Abstract

In compositional data analysis an observation is a vector containing non-negative values, only
the relative sizes of which are considered to be of interest. Without loss of generality, a
compositional vector can be taken to be a vector of proportions that sum to one. Data of this type
arise in many areas including geology, archaeology, biology, economics and political science.
In this paper we investigate methods for classification of compositional data. Our approach
centres on the idea of using the α-transformation to transform the data and then to classify the
transformed data via regularised discriminant analysis and the k-nearest neighbours algorithm.
Using the α-transformation generalises two rival approaches in compositional data analysis, one
(when α = 1) that treats the data as though they were Euclidean, ignoring the compositional
constraint, and another (when α = 0) that employs Aitchison’s centred log-ratio transformation.
A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives
better classification performance depends on the dataset, and moreover that using an
intermediate value of α can sometimes give better performance than using either 1 or 0.

Keywords: compositional data, classification, α-transformation, α-metric, Jensen-Shannon
divergence


1 Introduction 


Compositional data arise commonly in many fields, for instance geology (Aitchison, 1984),
in studying the constitution of rock samples; economics (Fry et al., 2000), in budget allocations;
archaeology (Baxter et al., 2005), in the constitution of man-made glasses; and the political
sciences (Rodrigues and Lima, 2009), in voting behaviour. In compositional data analysis, a
composition is considered an equivalence class comprising the set of multivariate vectors that
differ only by a scalar factor and have non-negative components. Consequently, without loss
of generality, an observation may be viewed as a vector of proportions, i.e., with non-negative
components constrained to sum to 1. The sample space of the observations is hence the simplex


$$ \mathbb{S}^d = \left\{ (x_1, \ldots, x_D)^\top \;\middle|\; x_i \ge 0, \ \sum_{i=1}^{D} x_i = 1 \right\}, $$

where D denotes the number of components of the vector and d = D − 1.

For statistical analysis of compositional data the question of how to account for the
compositional constraint arises. A simple approach is to ignore the compositional constraint and
treat the data as though they were Euclidean, an approach we will call “Euclidean data analysis”
(EDA) (Baxter, 2001; Baxter et al., 2005; Baxter and Freestone, 2006; Woronow, 1997).


There is a school of thought, however, largely following from the work of Aitchison (1982,
1983, 1992), that ignoring the compositional constraint is inappropriate and can lead to
misleading inferences. Aitchison contended that data should instead be analysed after applying
a “logratio” transformation, arguing that this amounted to working with an implied distance
measure on the simplex (discussed further in the next section) that satisfied particular
mathematical properties he regarded as essential for compositional data analysis. Other approaches
to compositional data analysis that we mention here but do not consider further in this paper
include … (Scealy and Welsh, 2011), and parametric modelling, for example using the Dirichlet
distribution (Gueorguieva et al., 2008).


Both EDA and Aitchison’s logratio analysis (LRA) are widely used, and there has
been a long and ongoing disagreement over which of these approaches, or indeed others, is
most appropriate to use. The debate remains largely centred on the distance measures implied
by the various approaches and whether or not they satisfy particular mathematical properties.
Scealy and Welsh (2014) have recently presented a historical summary of the debate, and have
given a critical appraisal of the properties often invoked by authors to support the use of LRA.
We share Scealy and Welsh’s opinion that LRA should not be a default choice for compositional 
data analysis on account of such properties. In this paper, we take the pragmatic view, which 
seems especially relevant for classification problems (in which out-of-sample classification error 
rate provides an objective measure of performance), that we should adopt whichever approach 
performs best in a given setting. 

Indeed, a key message of this paper is that for classification problems, the choice of whether 
or not one should transform the data, and if so which transformation to use, should depend on 
the dataset under study. This conclusion is clear from the fact that we can easily generate a 
synthetic dataset for which LRA will perform perfectly and EDA poorly, and vice versa. 

One characteristic of a dataset that immediately rules out using LRA in its standard form
is the presence of observations for which one or more components is zero, since for such
observations the logratio transformation is undefined. Data of this type are not uncommon (in §4 we
consider two datasets containing observations with zeros), so this is a notable weakness of LRA.
Some attempts have been made to modify LRA to make it appropriate for data containing
zeros (particularly when the zeros are assumed to arise from rounding error), but these involve
a somewhat ad hoc imputation approach of replacing zeros with small values. On a different
tack, Butler and Glasbey (2008) developed parametric models specifically for compositional
data with zeros.

Perhaps due to the nature of the data, little attention has been given to the problem of
classifying compositional data, especially where zero values are present. Exceptions are
Zadora et al. (2010) and Neocleous et al. (2011), who consider classification using parametric
models to account for the possibility of zero values; see also Palarea-Albaladejo et al. (2005), who
consider the related problem of cluster analysis. Our goal in this paper is to develop adaptive
classification algorithms which take into account the characteristics of individual datasets, such
as the distribution of the groups and the presence of zeros. The main idea is to employ the
Box-Cox-type α-transformation explored in Tsagris et al. (2011), and then use the transformed
data as a basis for classification. This transformation has a free parameter, α, and is such that
the case α = 0 corresponds to the logratio transformation, and α = 1 corresponds to a linear
transformation of the data. Hence using α = 0 corresponds to LRA, and α = 1, when used in
conjunction with the discriminant analysis and nearest-neighbour classification algorithms that
we consider in §3, is equivalent to EDA. For values of α between 0 and 1, the α-transformation
offers a compromise between LRA and EDA. An important benefit of the α-transformation is
that it is well-defined for any α > 0 for compositions containing zeros.

The paper is structured as follows. In §2 we discuss in more detail the α-transformation and
the logratio transformation, and the associated implied distance measures, and then in §3 we
consider some classification techniques and how their performance can be improved using the
α-transformation. In §4 we present the results of a numerical study with four real datasets to
investigate the performance of the various techniques. We conclude in §5 with a discussion of
the results.


2 The α-transformation and implied simplicial distance measure


The α-transformation of a compositional vector x ∈ S^d (see Tsagris et al. (2011)) is defined by

$$ z_\alpha(x) = H \cdot \left( \frac{D\, u_\alpha(x) - \mathbf{1}_D}{\alpha} \right), \qquad (1) $$

with α > 0 (we discuss more general α below), and where

$$ u_\alpha(x) = \left( \frac{x_1^\alpha}{\sum_{j=1}^{D} x_j^\alpha}, \ldots, \frac{x_D^\alpha}{\sum_{j=1}^{D} x_j^\alpha} \right)^\top \qquad (2) $$

is the compositional power transformation (Aitchison, 2003), 1_D is the D-dimensional vector of
ones, and H is any d-by-D matrix consisting of orthonormal rows, each of which is orthogonal
to 1_D; similar ideas have been used in the compositional data context by Egozcue et al. (2003)
and, of course, in many other contexts. A suitable choice for H (noting in any case that the
classification methods in this paper are invariant to the particular choice) is the Helmert matrix
(Lancaster, 1965; Dryden and Mardia, 1998) with the first row removed, i.e., the matrix whose
jth row is

$$ (h_j, \ldots, h_j, -j h_j, 0, \ldots, 0), \quad \text{where } h_j = -\{ j(j+1) \}^{-1/2}, \qquad (3) $$


with h_j repeated j times and 0 repeated d − j times. The purpose of H is to remove the
redundant dimension which is present due to the compositional constraint. In particular, the
vector (D u_α(x) − 1_D)/α has components which sum to zero and therefore lies in a
d-dimensional subspace of R^D, and left-multiplication by H is an isometric one-to-one mapping
from this subspace into R^d. The image V_α = {z_α(x) : x ∈ S^d} of transformation (1) is R^d in
the limit α → 0 but a strict subset of R^d for α ≠ 0. Transformation (1) is invertible: for
v ∈ V_α the inverse of z_α(x) is

$$ z_\alpha^{-1}(v) = u_\alpha^{-1}\left( \alpha H^\top v + \mathbf{1}_D \right) \in \mathbb{S}^d, \qquad (4) $$

where

$$ u_\alpha^{-1}(x) = \left( \frac{x_1^{1/\alpha}}{\sum_{j=1}^{D} x_j^{1/\alpha}}, \ldots, \frac{x_D^{1/\alpha}}{\sum_{j=1}^{D} x_j^{1/\alpha}} \right)^\top. \qquad (5) $$
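To make equations (1) to (5) concrete, the following pure-Python sketch builds the Helmert sub-matrix (3), applies the α-transformation (1) and inverts it through (4) and (5). The function names are our own, not from any library; the sketch assumes α > 0 and an input composition that already sums to one, and it uses plain lists rather than a linear-algebra package so that it stays self-contained.

```python
import math


def helmert_sub(D):
    """Helmert matrix with its first row removed, as a d-by-D list of rows.

    Row j (j = 1, ..., d) is (h_j, ..., h_j, -j*h_j, 0, ..., 0) with
    h_j = -(j*(j+1))**(-1/2), as in equation (3).
    """
    rows = []
    for j in range(1, D):
        h = -1.0 / math.sqrt(j * (j + 1))
        rows.append([h] * j + [-j * h] + [0.0] * (D - j - 1))
    return rows


def power_transform(x, alpha):
    """Compositional power transformation u_alpha(x), equation (2)."""
    p = [xi ** alpha for xi in x]
    s = sum(p)
    return [pi / s for pi in p]


def alpha_transform(x, alpha):
    """z_alpha(x) = H (D u_alpha(x) - 1_D) / alpha, equation (1), for alpha != 0."""
    D = len(x)
    v = [(D * ui - 1.0) / alpha for ui in power_transform(x, alpha)]
    return [sum(h * vi for h, vi in zip(row, v)) for row in helmert_sub(D)]


def alpha_inverse(z, alpha):
    """Inverse of (1) via (4)-(5): u_alpha^{-1}(alpha H^T z + 1_D)."""
    D = len(z) + 1
    H = helmert_sub(D)
    w = [alpha * sum(H[r][i] * z[r] for r in range(D - 1)) + 1.0
         for i in range(D)]
    # Entries of w equal D*u_i >= 0; clamp tiny negative rounding errors so
    # that a fractional power of a "zero" component stays real.
    p = [max(wi, 0.0) ** (1.0 / alpha) for wi in w]
    s = sum(p)
    return [pi / s for pi in p]


x = [0.0, 0.4, 0.6]           # a composition with a zero component
z = alpha_transform(x, 0.5)   # well defined because alpha > 0
print(z, alpha_inverse(z, 0.5))
```

Round-tripping a composition through `alpha_transform` and `alpha_inverse` recovers it up to floating-point error, including when a component is zero and α > 0.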


If one is willing to exclude from the sample space the boundary of the simplex, which corresponds
to observations that have one or more components equal to zero, then the α-transformation (1)
and its inverse (4) are well defined for all α ∈ R. (Excluding the boundary is standard practice
in LRA, where restricting the sample space in this way sidesteps the problem of data with zeros.)
The motivation for transformation (1) is that the case α = 0 corresponds to LRA, whereas α = 1
corresponds to EDA. We define the case α = 0 in terms of the limit α → 0; then

$$ z_0(x) = \lim_{\alpha \to 0} z_\alpha(x) = H \cdot w(x), \qquad (6) $$

where

$$ w(x) = \left( \log\frac{x_1}{g(x)}, \ldots, \log\frac{x_D}{g(x)} \right)^\top \qquad (7) $$

is Aitchison’s centred logratio transformation (Aitchison, 1983, 2003) and
g(x) = (x_1 x_2 ⋯ x_D)^{1/D} is the geometric mean of the components of x. See the Appendix
for proof of (6). For the case α = 1, (1) is just a linear transformation of the simplex.
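The limit (6) can be checked numerically. Because H is a fixed linear map, it is enough to compare the vector (D u_α(x) − 1_D)/α with w(x) in (7) for a small α. The sketch below uses hypothetical helper names and pure Python; it also illustrates that w(x) cannot be computed when a component is zero, whereas the power-based vector can for α > 0.

```python
import math


def centred_power(x, alpha):
    """(D u_alpha(x) - 1_D) / alpha: the alpha-transformation before applying H."""
    D = len(x)
    p = [xi ** alpha for xi in x]
    s = sum(p)
    return [(D * pi / s - 1.0) / alpha for pi in p]


def clr(x):
    """Aitchison's centred logratio transformation w(x), equation (7)."""
    logs = [math.log(xi) for xi in x]  # raises ValueError if some x_i == 0
    log_g = sum(logs) / len(logs)      # log of the geometric mean g(x)
    return [li - log_g for li in logs]


x = [0.1, 0.3, 0.6]
gap = max(abs(a - b) for a, b in zip(centred_power(x, 1e-6), clr(x)))
print(gap)                      # close to zero: the alpha -> 0 limit
print(centred_power(x, 1.0))    # equals D*x - 1_D, a linear map of x

try:
    clr([0.0, 0.5, 0.5])
except ValueError:
    print("clr undefined: a component is zero")
print(centred_power([0.0, 0.5, 0.5], 0.5))  # still defined for alpha > 0
```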


Power transformations similar to (1) were considered by Greenacre (2009) and Greenacre
(2011), in the somewhat different context of correspondence analysis. A Box-Cox transformation
applied to each component of x ∈ S^d, so that x is transformed to

$$ \left( \frac{x_1^\theta - 1}{\theta}, \ldots, \frac{x_D^\theta - 1}{\theta} \right)^\top, \qquad (8) $$

has the limit (log x_1, ..., log x_D)^T as θ → 0. We favour transformation (1) in this work in view
of its closer connection, via (6), to Aitchison’s centred logratio transformation.

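The α-transformation (1) leads to a natural simplicial distance measure Δ_α(x, y), which we
call the α-metric, between observations x, y ∈ S^d, defined in terms of the Euclidean distance
‖·‖ between transformed observations, i.e.,

$$ \Delta_\alpha(x, y) = \left\| z_\alpha(x) - z_\alpha(y) \right\| = \frac{D}{|\alpha|} \left[ \sum_{i=1}^{D} \left( \frac{x_i^\alpha}{\sum_{j=1}^{D} x_j^\alpha} - \frac{y_i^\alpha}{\sum_{j=1}^{D} y_j^\alpha} \right)^2 \right]^{1/2}. \qquad (9) $$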
The special case

$$ \Delta_0(x, y) := \lim_{\alpha \to 0} \Delta_\alpha(x, y) = \left[ \sum_{i=1}^{D} \left( \log\frac{x_i}{g(x)} - \log\frac{y_i}{g(y)} \right)^2 \right]^{1/2} \qquad (10) $$

is Aitchison’s distance measure (Aitchison et al., 2000), whereas

$$ \Delta_1(x, y) = D \left[ \sum_{i=1}^{D} (x_i - y_i)^2 \right]^{1/2} \qquad (11) $$

is just Euclidean distance multiplied by D.

Transformation (1) and the implied distance measure (9) offer flexibility in data analysis:
the choice of α enables either LRA or EDA, or a compromise between the two, and the particular
value of α can be chosen to optimise some measure of practical performance (in this paper, the
out-of-sample classification error rate). Crucially, for α > 0, the transformation and distance
are well defined even when some components have zero values, in contrast to (7) and (10).
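A minimal sketch of the α-metric (9) follows, with the case α = 0 computed directly as Aitchison's distance (10). The function name is our own; the code assumes both inputs are compositions of the same length. For α < 0 a zero component makes Python raise `ZeroDivisionError`, consistent with the restriction to α > 0 for data with zeros.

```python
import math


def alpha_metric(x, y, alpha):
    """Alpha-metric Delta_alpha(x, y) of equation (9).

    alpha == 0 is handled as Aitchison's distance (10), which needs x, y > 0.
    For alpha < 0, zero components raise ZeroDivisionError.
    """
    D = len(x)
    if alpha == 0:
        lgx = sum(math.log(v) for v in x) / D  # log g(x)
        lgy = sum(math.log(v) for v in y) / D  # log g(y)
        return math.sqrt(sum(((math.log(a) - lgx) - (math.log(b) - lgy)) ** 2
                             for a, b in zip(x, y)))
    px = [v ** alpha for v in x]
    py = [v ** alpha for v in y]
    sx, sy = sum(px), sum(py)
    ss = sum((a / sx - b / sy) ** 2 for a, b in zip(px, py))
    return D / abs(alpha) * math.sqrt(ss)


x, y = [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]
print(alpha_metric(x, y, 1.0))                          # D times Euclidean distance, (11)
print(alpha_metric(x, y, 1e-6), alpha_metric(x, y, 0))  # nearly equal, cf. (10)
print(alpha_metric([0.0, 0.5, 0.5], x, 0.5))            # finite despite the zero
```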

Amongst the criteria for compositional distance measures listed by Aitchison (1992), the
distance measure (9) satisfies “positivity” (Δ_α(x, y) > 0 for x ≠ y), “zero difference for
equivalent compositions” (Δ_α(x, x) = 0), “interchangeability of compositions”
(Δ_α(x, y) = Δ_α(y, x)), “scale invariance” (Δ_α(ax, λy) = Δ_α(x, y) for all a > 0, λ > 0) and
“permutation invariance” (Δ_α(Px, Py) = Δ_α(x, y) for any permutation P). It does not satisfy
“perturbation invariance”, a property strongly tied to the logratio transformation (Aitchison,
2003); nor does it satisfy “subcompositional coherence”, a criterion that affects inferences
regarding the relationships between compositional components (Greenacre, 2011). The question of
how much importance should be given to subcompositional coherence in compositional data analysis
has been a matter of much debate; see for example the historical review and discussion in Scealy
and Welsh (2014). Our view is similar to that of Scealy and Welsh (2014), which is that
subcompositional dominance is not a property of primary importance, although we point out that a
referee strongly disagrees with our position. We reiterate that our motivation is to achieve strong
practical performance, whether or not our distance measure satisfies any particular properties.
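Several of these properties are easy to check numerically. The sketch below restates (9) and (10) compactly in our own helper and confirms scale and permutation invariance for α = 0.5. It also shows that perturbation, x ↦ C(p_1 x_1, ..., p_D x_D) with C denoting closure to unit sum, leaves Aitchison's distance (10) unchanged but changes Δ_α for α = 0.5. The vectors used are arbitrary illustrations, not data from this paper.

```python
import math


def alpha_metric(x, y, alpha):
    """Compact restatement of (9), with alpha == 0 giving Aitchison's distance (10)."""
    if alpha == 0:
        cx = [math.log(v) for v in x]
        cy = [math.log(v) for v in y]
        mx, my = sum(cx) / len(x), sum(cy) / len(y)
        return math.sqrt(sum(((a - mx) - (b - my)) ** 2 for a, b in zip(cx, cy)))
    px, py = [v ** alpha for v in x], [v ** alpha for v in y]
    sx, sy = sum(px), sum(py)
    return len(x) / abs(alpha) * math.sqrt(
        sum((a / sx - b / sy) ** 2 for a, b in zip(px, py)))


def perturb(p, x):
    """Perturbation of x by p: componentwise product followed by closure."""
    prod = [pi * xi for pi, xi in zip(p, x)]
    s = sum(prod)
    return [v / s for v in prod]


x, y, p = [0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.7, 0.2, 0.1]
perm = [2, 0, 1]  # an arbitrary permutation of the components
base = alpha_metric(x, y, 0.5)
scaled = alpha_metric([3 * v for v in x], [7 * v for v in y], 0.5)
permuted = alpha_metric([x[i] for i in perm], [y[i] for i in perm], 0.5)
print(abs(scaled - base), abs(permuted - base))  # both essentially zero
print(alpha_metric(perturb(p, x), perturb(p, y), 0) - alpha_metric(x, y, 0))  # ~0
print(alpha_metric(perturb(p, x), perturb(p, y), 0.5) - base)  # clearly nonzero
```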


3 Classification techniques for compositional data 

The key idea now is to use the α-transformation (1) in conjunction with regularised discriminant
analysis (RDA), and the α-metric (9) in conjunction with k-nearest-neighbours (k-NN)
classification, to investigate how performance for various values of α compares with the special
cases of EDA (α = 1) and LRA (α = 0). We will begin with a brief review of regularised
discriminant analysis, of which linear and quadratic discriminant analysis are special cases, and
of the k-nearest neighbours algorithm.

3.1 Regularised discriminant analysis (RDA) 

In discriminant analysis we allocate an observation to the group with the highest (posterior) 
density, assuming that observations in each group come from a multivariate normal distribution.
Given a training sample with g groups containing n_1, ..., n_g observations, a new observation
z ∈ R^d is classified to the group for which the discriminant score, δ_i(z), is largest, where

$$ \delta_i(z) = -\frac{1}{2}\log\left| 2\pi\hat{\Sigma}_i \right| - \frac{1}{2}(z - \hat{\mu}_i)^\top \hat{\Sigma}_i^{-1} (z - \hat{\mu}_i) + \log \pi_i; \qquad (12) $$

here |·| denotes determinant, π_i = n_i/n with n = Σ_{i=1}^{g} n_i, and μ̂_i and Σ̂_i are the sample
mean vector and covariance matrix, respectively, of the ith group. Equation (12) is the Bayesian
version of discriminant analysis, incorporating the prior group membership probabilities
π = (π_1, ..., π_g); the choice π_i = n_i/n assumes that the proportions of observations in the
training sample are representative of the proportions in the population. Other choices of π are
possible depending on available prior information. The frequentist version uses instead
π_i = 1/g. We will use the Bayesian version with π_i = n_i/n in our numerical investigations in §4.

The boundary between classification regions, say between groups i and j, is defined by
δ_i(z) = δ_j(z). From (12), the boundaries are hence quadratic in z, and for this reason the
approach is termed quadratic discriminant analysis (QDA). If we make the simplifying assumption
that the groups share a common covariance matrix, then the Σ̂_i in (12) can be replaced with
the pooled estimate

$$ \hat{\Sigma}_p = \frac{\sum_{i=1}^{g} (n_i - 1)\hat{\Sigma}_i}{n - g}. $$

In this case, the boundaries are linear, and the approach is hence termed linear discriminant
analysis (LDA).

QDA and LDA are special cases of so-called regularised discriminant analysis (RDA); see
Hastie et al. (2001, pp. 112-113). The idea of RDA is to regularise the covariance matrices by
replacing them with weighted averages

$$ \hat{\Sigma}_i(\lambda, \gamma) = \lambda \hat{\Sigma}_i + (1 - \lambda)\hat{\Sigma}_p(\gamma) \quad \text{and} \quad \hat{\Sigma}_p(\gamma) = \gamma \hat{\Sigma}_p + (1 - \gamma)\,\mathrm{tr}(\hat{\Sigma}_p)\, I / d, $$

where λ, γ ∈ [0, 1] are two free parameters and I is the d-by-d identity matrix. Parameter λ
offers a trade-off between the more flexible classification boundaries of QDA and the greater
stability of LDA to one or more of the Σ̂_i being ill-conditioned. Parameter γ offers further
stability if the pooled estimate Σ̂_p is itself ill-conditioned. Choosing λ = 1 gives QDA, whereas
choosing λ = 0 and γ = 1 gives LDA.
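The pooled estimate and the two regularisation steps of RDA can be sketched as follows. With λ = 1 the group covariance matrices are returned unchanged (QDA), and with λ = 0 and γ = 1 every group receives the pooled estimate (LDA). The helper names are hypothetical, and the sample covariance uses the usual n_i − 1 divisor, which we assume matches the definition of Σ̂_i above.

```python
def sample_mean_cov(rows):
    """Sample mean vector and covariance matrix (divisor n_i - 1) of one group."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def pooled_cov(covs, sizes):
    """Pooled estimate: sum_i (n_i - 1) Sigma_i / (n - g)."""
    n, g, d = sum(sizes), len(sizes), len(covs[0])
    return [[sum((ni - 1) * S[a][b] for S, ni in zip(covs, sizes)) / (n - g)
             for b in range(d)] for a in range(d)]


def rda_covs(covs, sizes, lam, gam):
    """Regularised covariance matrices Sigma_i(lam, gam) for all groups."""
    d = len(covs[0])
    Sp = pooled_cov(covs, sizes)
    shrink = sum(Sp[j][j] for j in range(d)) / d  # tr(Sigma_p) / d
    # Sigma_p(gam) = gam * Sigma_p + (1 - gam) * tr(Sigma_p) I / d
    Sp_gam = [[gam * Sp[a][b] + (1 - gam) * (shrink if a == b else 0.0)
               for b in range(d)] for a in range(d)]
    # Sigma_i(lam, gam) = lam * Sigma_i + (1 - lam) * Sigma_p(gam)
    return [[[lam * S[a][b] + (1 - lam) * Sp_gam[a][b] for b in range(d)]
             for a in range(d)] for S in covs]
```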

We propose to use RDA with data transformed using the α-transformation (1), and will
denote this by RDA(α, λ, γ). Hence, RDA(0, λ, γ) amounts to the LRA approach of applying
RDA to data transformed using the isometric log-ratio transformation (6), whereas RDA(1, λ, γ)
amounts to the EDA approach of applying RDA to untransformed data. We will also use the
notation QDA(α) = RDA(α, 1, 0) and LDA(α) = RDA(α, 0, 1).


3.2 k-nearest neighbours (k-NN)

The k-NN algorithm is an intuitive classifier that assumes no parametric model. It involves
determining the k observations in the training sample that are closest, by some choice of distance
measure, to the new test observation, then allocating the test observation to the group most
common amongst these k “nearest neighbours”. Ties caused by two or more groups jointly
being most common can be broken by allocating uniformly at random amongst the tied groups
(the strategy we use in our numerical examples in §4) or else by using a secondary tie-breaking
criterion.

Performance of k-NN depends on the choice of k: small k allows for classification boundaries
which are flexible but which have a tendency to overfit, with the opposite true when k is large.
It also depends on the choice of distance measure. Since we are dealing with compositional data
we shall use the α-metric (9), denoting such an approach k-NN(α), so k-NN(0) indicates the
LRA approach of using k-NN with Aitchison’s distance (10), while k-NN(1) indicates the EDA
approach of using k-NN based on Euclidean distance.
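A minimal sketch of k-NN(α): compute the α-metric (9) from the test composition to every training composition, keep the k smallest distances, and break ties between equally common groups uniformly at random. The function names, the restriction to α ≠ 0, and the handling of exactly equal distances at the kth neighbour (left to Python's stable sort order) are our own assumptions.

```python
import random
from collections import Counter


def alpha_metric(x, y, alpha):
    """Alpha-metric (9) for alpha != 0 (zeros allowed when alpha > 0)."""
    px, py = [v ** alpha for v in x], [v ** alpha for v in y]
    sx, sy = sum(px), sum(py)
    ss = sum((a / sx - b / sy) ** 2 for a, b in zip(px, py))
    return len(x) / abs(alpha) * ss ** 0.5


def knn_alpha(train_x, train_y, x_new, k, alpha, rng=random):
    """Classify x_new by majority vote among its k nearest neighbours under (9)."""
    dists = sorted(((alpha_metric(x_new, xt, alpha), yt)
                    for xt, yt in zip(train_x, train_y)),
                   key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    top = max(votes.values())
    tied = [lab for lab, c in votes.items() if c == top]
    return rng.choice(tied)  # uniform random choice among tied groups
```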

We can equally easily use any of many other possible distance measures. For the sake of
comparing performance with the α-metric we also consider one alternative, namely the following
variant of the Jensen-Shannon divergence:

$$ \mathrm{ESOV}(x, y) = \left[ \sum_{i=1}^{D} \left( x_i \log\frac{2x_i}{x_i + y_i} + y_i \log\frac{2y_i}{x_i + y_i} \right) \right]^{1/2}. \qquad (14) $$

We use the notation ESOV after Endres and Schindelin (2003) and Österreicher and Vajda
(2003), who independently proved that (14) satisfies the triangle inequality and thus is a metric.
As with the α-metric (9), the ESOV metric (14) is well defined even when zero values are
present (with the usual convention 0 log 0 = 0). We denote the k-NN classifier based on metric
(14) by k-NN_ESOV.
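The ESOV metric (14) is similarly short to implement. In this sketch (hypothetical function name), a term with x_i = 0 or y_i = 0 contributes zero, which is the convention 0 log 0 = 0 that keeps the metric well defined in the presence of zeros.

```python
import math


def esov(x, y):
    """ESOV metric (14): square root of the symmetric Jensen-Shannon-type sum."""
    total = 0.0
    for xi, yi in zip(x, y):
        m = xi + yi
        if xi > 0:  # 0 * log(0) is taken to be 0
            total += xi * math.log(2 * xi / m)
        if yi > 0:
            total += yi * math.log(2 * yi / m)
    return math.sqrt(max(total, 0.0))  # clamp any tiny negative rounding error


print(esov([0.2, 0.3, 0.5], [0.0, 0.6, 0.4]))  # defined despite the zero
```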


4 Applications of compositional classification 

We will show four examples of applications of the proposed compositional discrimination
techniques. In all cases we used real datasets, two of them having observations with zero values
in some of the components, and the other two having no zero values. The two benchmarks for
comparison will be α = 0, which results in LRA, and α = 1, which results in EDA.

We performed RDA(α, λ, γ) and k-NN(α), varying α in steps of 0.05 between −1 and 1 for
datasets without zeros and between 0.05 and 1 for datasets with zeros (since in such
circumstances the α-transformation and α-metric are not defined for α ≤ 0), and varying the
values of λ and γ in steps of 0.1 between 0 and 1.
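For concreteness, the parameter grids described above can be generated as follows. Rounding guards against floating-point drift, so that α = 0 (LRA) and α = 1 (EDA) appear exactly in the grid. The helper name is ours, and because the grid of k values for k-NN(α) is not listed in this paragraph, it is omitted here.

```python
def grid(start, stop, step):
    """Evenly spaced values from start to stop inclusive, rounded to 10 d.p."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


alphas_no_zeros = grid(-1.0, 1.0, 0.05)    # includes alpha = 0 (LRA) and 1 (EDA)
alphas_with_zeros = grid(0.05, 1.0, 0.05)  # alpha <= 0 excluded when zeros occur
lambdas = grid(0.0, 1.0, 0.1)
gammas = grid(0.0, 1.0, 0.1)

# Every (alpha, lambda, gamma) combination tried for RDA on a dataset without zeros.
rda_grid = [(a, lam, gam) for a in alphas_no_zeros for lam in lambdas for gam in gammas]
print(len(alphas_no_zeros), len(alphas_with_zeros), len(rda_grid))  # 41 20 4961
```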

To estimate the rate of correct classification in out-of-sample prediction we used cross
validation. This involves dividing the set of n observations into training and test sets of size
n_train and n_test respectively, training the classifier on the training set, then evaluating its
prediction accuracy on the test set. In view of the samples having groups with quite variable
numbers of observations, we used stratified random sampling to ensure that the training sets were
representative of the test sets, and to arrange that all groups were represented in the test set. In
particular, we randomly divided the samples into training and test sets so that

$$ \frac{n_i}{n} \approx \frac{n_{i,\mathrm{train}}}{n_{\mathrm{train}}} \approx \frac{n_{i,\mathrm{test}}}{n_{\mathrm{test}}}, \qquad i = 1, \ldots, g, $$

where n_i, n_{i,train} and n_{i,test} are the sample sizes of the ith group in the full, training and
test samples, respectively. We then estimated the rate of correct classification by

$$ q = \frac{c}{n_{\mathrm{test}}}, \qquad (15) $$

where c is the number of observations in the test set correctly classified and n_test is the test
sample size.
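The text does not say how fractional group allocations are rounded. One reasonable choice, assumed here purely for illustration, is largest-remainder rounding with at least one test observation per group (so that every group appears in the test set) while always leaving at least one observation per group for training. The helper below is hypothetical.

```python
import math


def allocate_test_sizes(group_sizes, n_test):
    """Hypothetical rule: per-group test-set sizes roughly proportional to n_i / n.

    Starts from floor(n_test * n_i / n) (at least 1), then adds or removes one
    observation at a time from the group furthest from its proportional target,
    always leaving at least one observation of each group for training.
    """
    n = sum(group_sizes)
    target = [n_test * ni / n for ni in group_sizes]
    alloc = [max(1, math.floor(t)) for t in target]
    while sum(alloc) < n_test:
        i = max((j for j in range(len(alloc)) if alloc[j] < group_sizes[j] - 1),
                key=lambda j: target[j] - alloc[j])
        alloc[i] += 1
    while sum(alloc) > n_test:
        i = max((j for j in range(len(alloc)) if alloc[j] > 1),
                key=lambda j: alloc[j] - target[j])
        alloc[i] -= 1
    return alloc


glass = [13, 29, 9, 17, 70, 76]  # group sizes of the forensic glass data (Example 2)
print(allocate_test_sizes(glass, 30))  # [2, 4, 1, 2, 10, 11]
```

Under this assumed rule, a test set of 30 forensic glass observations would be split 2, 4, 1, 2, 10 and 11 across the six categories.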

For each of the classifiers RDA(α, λ, γ) and k-NN(α), the steps can be summarised as follows:

Step 1. Partition the sample into training and test sets using stratified random sampling.

Step 2. For each combination of values of the free parameters (α, λ, γ for RDA; α, k for k-NN(α)),
train the classifier on the training set.

Step 3. Apply the classifiers to the test set, and calculate q in (15).

Step 4. Repeat Steps 1 to 3 a large number, say B, of times, then estimate the rate of correct
classification as the average of the values of q from Step 3.

For the calculations in the following section we took B = 200 which gave estimates of the 
rate of correct classification with small standard errors at reasonable computational cost. 
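Steps 1 to 4 can be sketched as a loop in which every candidate parameter combination is evaluated on the same random split, mirroring Step 2. Each classifier is represented by a hypothetical `fit_predict(train_X, train_y, test_X)` callable, the per-group test counts are passed in as a dictionary, and only the average of q is returned, because the text does not spell out how the reported standard errors were computed.

```python
import random
import statistics


def stratified_split(labels, n_test_per_group, rng):
    """Step 1: draw the stated number of test indices at random from each group."""
    by_group = {}
    for idx, lab in enumerate(labels):
        by_group.setdefault(lab, []).append(idx)
    test = []
    for lab, idxs in by_group.items():
        test.extend(rng.sample(idxs, n_test_per_group[lab]))
    in_test = set(test)
    train = [i for i in range(len(labels)) if i not in in_test]
    return train, test


def repeated_cv(X, labels, n_test_per_group, classifiers, B=200, seed=0):
    """Steps 1-4: average q = c / n_test over B stratified random splits.

    classifiers maps a name (e.g. one parameter combination) to a callable
    fit_predict(train_X, train_y, test_X) that returns predicted labels.
    """
    rng = random.Random(seed)
    qs = {name: [] for name in classifiers}
    for _ in range(B):
        train, test = stratified_split(labels, n_test_per_group, rng)
        tr_X = [X[i] for i in train]
        tr_y = [labels[i] for i in train]
        te_X = [X[i] for i in test]
        for name, fit_predict in classifiers.items():  # Step 2: same split for all
            preds = fit_predict(tr_X, tr_y, te_X)
            c = sum(p == labels[i] for p, i in zip(preds, test))
            qs[name].append(c / len(test))  # Step 3: q = c / n_test, eq. (15)
    return {name: statistics.mean(v) for name, v in qs.items()}  # Step 4


labels = [0] * 10 + [1] * 10
X = [[float(lab)] for lab in labels]  # toy feature that reveals the label
classifiers = {
    "always0": lambda trX, trY, teX: [0] * len(teX),
    "oracle": lambda trX, trY, teX: [int(x[0]) for x in teX],
}
print(repeated_cv(X, labels, {0: 2, 1: 3}, classifiers, B=50))  # {'always0': 0.4, 'oracle': 1.0}
```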



4.1 Examples 

We will now introduce four datasets to investigate the performance of the supervised
classification techniques described in §3. The datasets come from different fields, namely ecology,
forensic science, hydrochemistry and economics.


Example 1: Fatty acid signature data (contains zero values) 

This is a dataset described in Stewart and Field (2011) (itself an updated version of a dataset
from Iverson et al. (2004)) which contains observations of n = 2110 fish of g = 28 different
species, each observation being a composition with D = 40 components that characterises the
fatty acid signature of the fish. A special feature of this dataset is that it contains many
zero values (3506 components, across all observations, are zero), which rules out use of the
logratio transformation (7). Table 1 shows the number of observations in each group, and the
number of observations for which at least one component is zero. Table 2 shows the proportion
of observations which have zeros in each of the components. For this example, for the cross
validation we used a test set of n_test = 165 observations (7.8% of the full sample).


Species            Sample size   Species        Sample size   Species          Sample size
Butterfish         35 (30)       Mackerel       34 (23)       Snake Blenny     18 (12)
Capelin            165 (145)     Ocean Pout     31 (31)       Squid            18 (17)
Cod                147 (121)     Plaice         148 (120)     Thorny Skate     74 (74)
Gaspereau          70 (69)       Pollock        57 (49)       Turbot           20 (20)
Haddock            148 (134)     Red Hake       25 (24)       White Hake       75 (71)
Halibut            13 (11)       Redfish        84 (74)       White Flounder   90 (80)
Herring            247 (231)     Sandlance      124 (101)     Winter Skate     40 (39)
Lobster            21 (21)       Shrimp         122 (87)      Witch Flounder   24 (24)
Longhorn Sculpin   70 (69)       Silver Hake    70 (58)       Yellow Tail      118 (103)
Lumpfish           22 (13)

Table 1: Sample sizes of the different species in the fatty acid data. The number inside the
parentheses shows how many observations have at least one zero element.


Component              1       2       3       4       5       6       7       8       9        10
Percentage of zeros    0.00%   0.00%   0.00%   6.54%   0.28%   9.86%   9.10%   4.88%   65.36%   2.94%

Component              11      12      13      14      15      16      17      18      19       20
Percentage of zeros    0.00%   0.00%   0.00%   0.00%   6.78%   2.32%   0.62%   0.00%   3.51%    0.05%

Component              21      22      23      24      25      26      27      28      29       30
Percentage of zeros    2.65%   0.09%   0.00%   0.05%   1.80%   1.66%   0.00%   0.33%   0.05%    0.00%

Component              31      32      33      34      35      36      37      38      39       40
Percentage of zeros    0.33%   0.5%    0.00%   27.35%  0.00%   10.66%  0.00%   8.91%   0.00%    0.00%

Table 2: Fatty acid data: the percentage of observations for which each component is zero.


Example 2: Forensic glass data (contains zero values)

In the second example we use the forensic glass dataset (UC Irvine Machine Learning Repository,
2014), which has n = 214 observations from g = 6 different categories of glass, where each
observation is a composition with D = 8 components. The categories which occur are: containers (13
observations, 12 of which have at least one zero element), vehicle headlamps (29 observations,
all with at least one zero value), tableware (9 observations, all with at least one zero value),
vehicle window glass (17 observations, 16 with at least one zero value), window float glass (70
observations, 69 with at least one zero value) and window non-float glass (76 observations, 72
with at least one zero value). Once again the zeros rule out the use of LRA. In total there are
392 zero values; Table 3 shows in which components these zeros arise and Table 6 summarises
the distribution of zeros across the observations. For the cross validation we used a test set
consisting of n_test = 30 observations (14% of the total sample).


Component              Sodium   Magnesium   Aluminium   Silicon   Potassium   Calcium   Barium   Iron
Percentage of zeros    0.00%    19.63%      0.00%       0.00%     14.02%      0.00%     82.24%   67.29%

Table 3: Forensic glass data: the percentage of observations for which each component is zero.


Example 3: Hydrochemical data (contains no zero values) 


The hydrochemical dataset (Otero et al., 2005) contains compositional observations on D = 14
chemicals (H, Na, K, Ca, Mg, Sr, Ba, NH4, Cl, HCO3, NO3, SO4, PO4, TOC) in water samples
from tributaries of the Llobregat river in north-east Spain. The n = 485 observations are in g = 4
groups according to which tributary they were measured in: Anoia (143 observations), Cardener
(95 observations), Upper Llobregat (135 observations) or Lower Llobregat (112 observations).
For the cross validation in this example we used a test set of size n_test = 165 (34% of the
total sample size).


Example 4: National income data (contains no zero values) 


This final example is an economics dataset (Larrosa, 2003) containing compositional
observations for n = 56 countries with D = 5 components reflecting the proportion of capital
allocated in production assets, residential buildings, non-residential buildings, other buildings,
and transportation equipment. The countries are categorised into g = 5 groups according to income
levels and membership of the Organization for Economic Co-operation and Development (OECD); the
groups are “low income” (10 countries), “lower middle income” (12 countries), “upper middle
income” (9 countries), “high income and OECD member” (21 countries), and “high income and
non-OECD member” (4 countries). For the cross validation, we used a test set of n_test = 10
observations (17.9% of the total sample).


4.2 Results 

This section contains results from applying the methods of §3 to the four compositional datasets
described above. Results are summarised in Figures 1 and 2 and Tables 4 to 7. The tables show
results for α = 1, α = 0, and for α free in [−1, 1], in each case for the values of the free
parameters that maximise the estimated rate of correct classification.


Example 1 (Fatty acid signature data)

Method (α free)      Estimated rate   Method (α = 1)    Estimated rate
RDA(0.6, 0.9, 0.7)   0.962 (0.014)    RDA(1, 0.8, 1)    0.949 (0.016)
LDA(0.45)            0.897 (0.022)    LDA(1)            0.868 (0.024)
2-NN(0.35)           0.933 (0.020)    2-NN(1)           0.849 (0.027)
2-NN_ESOV            0.921 (0.019)

Example 2 (Forensic glass data)

Method (α free)      Estimated rate   Method (α = 1)    Estimated rate
RDA(0.95, 0.1, 1)    0.643 (0.034)    RDA(1, 0.1, 1)    0.643 (0.034)
LDA(0.4)             0.629 (0.034)    LDA(1)            0.629 (0.034)
3-NN(0.85)           0.719 (0.033)    2-NN(1)           0.719 (0.033)
3-NN_ESOV            0.693 (0.033)

Table 4: Estimated rate of correct classification of the different approaches. The standard error
appears inside the parentheses.


Fatty acid and glass data from Examples 1 and 2 

For both the fatty acid and forensic glass datasets, some of the groups have fewer observations
than the dimension D of the compositions, so QDA cannot be applied (since at least one of the
Σ̂_i in (12) is singular). RDA, LDA and k-NN are all applicable, however, and Table 4 shows
a comparison of performance for these techniques.
[Figure 1: graphs for Example 1 (Fatty acid signature data) through Example 4 (National income data); horizontal axes: α values]

Figure 1: All graphs contain the estimated rate of correct classification for the different methods.
The first column refers to LDA, QDA and RDA as a function of α. The second column contains
heat plots of the k-NN algorithm as a function of α and k, the number of nearest neighbours. The
graphs in the third column present the results of the k-NN algorithm with the α-metric for some
specific values of α.
[Figure 2 panels: RDA(0.6, 0.9, 0.7), LDA(0.45) and 2-NN(0.35); horizontal axes: proportion of zeros]

Figure 2: Fatty acid signature data: the estimated rate of correct classification by group versus
the proportion of observations within the group that contain at least one zero.



                     Number of zeros (percentage of observations)
Method               0 (12.27%)      1 (45.69%)      2 (2.61%)       3 (9.38%)       4-8 (10.5%)
RDA(0.6, 0.9, 0.7)   0.956 (0.043)   0.963 (0.021)   0.963 (0.030)   0.976 (0.040)   0.949 (0.053)
RDA(1, 0.8, 1)       0.951 (0.045)   0.962 (0.022)   0.941 (0.037)   0.942 (0.063)   0.911 (0.066)
LDA(0.45)            0.875 (0.065)   0.925 (0.029)   0.881 (0.051)   0.872 (0.091)   0.855 (0.093)
LDA(1)               0.882 (0.060)   0.898 (0.034)   0.842 (0.053)   0.822 (0.099)   0.813 (0.096)
2-NN(0.35)           0.923 (0.054)   0.938 (0.025)   0.923 (0.039)   0.963 (0.047)   0.922 (0.064)
2-NN(1)              0.844 (0.075)   0.853 (0.036)   0.853 (0.058)   0.874 (0.082)   0.803 (0.100)
2-NN_ESOV            0.918 (0.062)   0.928 (0.030)   0.913 (0.046)   0.962 (0.050)   0.880 (0.086)

Table 5: Fatty acid signature data: classification accuracy by number of zeros. The estimated
rate of correct classification is shown (with standard errors in parentheses).



                     Number of zeros (percentage of observations)
Method               0 (3.27%)       1 (29.44%)      2 (50.93%)      3 (13.55%)      4 (2.80%)
RDA(0.95, 0.1, 1)    0.421 (0.433)   0.582 (0.173)   0.668 (0.111)   0.787 (0.233)   0.402 (0.463)
RDA(1, 0.1, 1)       0.428 (0.435)   0.585 (0.174)   0.665 (0.112)   0.788 (0.233)   0.397 (0.459)
LDA(0.4)             0.404 (0.431)   0.536 (0.165)   0.636 (0.120)   0.869 (0.194)   0.689 (0.412)
LDA(1)               0.397 (0.430)   0.523 (0.162)   0.673 (0.110)   0.790 (0.230)   0.463 (0.462)
3-NN(0.85)           0.307 (0.394)   0.713 (0.160)   0.717 (0.108)   0.925 (0.146)   0.387 (0.429)
2-NN(1)              0.568 (0.441)   0.715 (0.160)   0.712 (0.114)   0.870 (0.178)   0.387 (0.429)
3-NN_ESOV            0.477 (0.447)   0.644 (0.164)   0.764 (0.097)   0.731 (0.243)   0.387 (0.429)

Table 6: Forensic glass data: classification accuracy by number of zeros. The estimated rate of
correct classification is shown (with standard errors in parentheses).


For the fatty acid data, RDA performs strongest, and best performance is achieved when 
a = 0.6. For this dataset k-NN(a) performs strongly too, with a = 0.35 giving notably better 
performance than a = 1 (which corresponds to the EDA approach). For the forensic glass data, 
k -NN outperformed RDA and LDA, and the flexibility of having a different from 1 offered no 
improvement. 

For both of these datasets, the results suggest no clear relationship between classification accuracy for the groups and the number of observations containing zeros, i.e., no clear evidence that observations with zeros were more or less difficult to classify correctly than those without zeros. Figure 2, for example, shows the classification accuracy for each group in the fatty acid dataset plotted against the proportion of observations that contain at least one zero, and no clear correlation is apparent. Results (not shown) for k-NN with the ESOV metric similarly show little pattern. Table 5 shows results for the fatty acid data according to the number of zeros in the observations; again there is no clear relationship between classification accuracy and number of zeros. The corresponding results in Table 6 for the forensic glass data show lower classification accuracy for observations with 0 or 4 zeros than for observations with 1, 2 or 3 zeros, albeit with large standard errors on account of the small number of such observations. Hence, again, there is no clear evidence that zeros make observations any more or less difficult to classify correctly.

The key points from these examples are that LRA is not directly applicable because of the zeros, whereas EDA (α = 1) performs quite well, with RDA performing better in one example and k-NN in the other; in one of the examples, letting α take a value other than 1 gave a further improvement.

Example 3 (Hydrochemical data)

| Method | Estimated rate of correct classification | Method | Estimated rate of correct classification | Method | Estimated rate of correct classification |
|---|---|---|---|---|---|
| RDA(0.15, 1, 0) | 0.909(0.02) | RDA(0, 1, 0) | 0.901(0.021) | RDA(1, 0.9, 0.9) | 0.793(0.029) |
| QDA(0.15) | 0.909(0.02) | QDA(0) | 0.901(0.021) | QDA(1) | - |
| LDA(0) | 0.750(0.031) | LDA(0) | 0.750(0.031) | LDA(1) | - |
| 2-NN(0.25) | 0.927(0.020) | 2-NN(0) | 0.855(0.026) | 2-NN(1) | 0.830(0.027) |
| 3-NN ESOV | 0.899(0.021) | | | | |

Example 4 (National income data)

| Method | Estimated rate of correct classification | Method | Estimated rate of correct classification | Method | Estimated rate of correct classification |
|---|---|---|---|---|---|
| RDA(−0.05, 0.5, 0) | 0.574(0.035) | RDA(0, 0.5, 0) | 0.574(0.035) | RDA(1, 0.2, 0) | 0.540(0.035) |
| QDA(−0.25) | 0.496(0.035) | QDA(0) | 0.487(0.035) | QDA(1) | 0.431(0.035) |
| LDA(0.5) | 0.503(0.035) | LDA(0) | 0.488(0.035) | LDA(1) | 0.483(0.035) |
| 2-NN(−0.5) | 0.586(0.035) | 3-NN(0) | 0.533(0.035) | 3-NN(1) | 0.515(0.035) |
| 3-NN ESOV | 0.541(0.035) | | | | |

Table 7: Estimated rate of correct classification of the different approaches (with standard errors in parentheses).

Hydrochemical and national income data from Examples 3 and 4 

For the hydrochemical data the extra flexibility of RDA over QDA offers no improvement (hence RDA and QDA give identical results). Ill-conditioning of the covariance matrices makes QDA and LDA unstable for α > 0.75, which is why in Figure 1(VII) the lines corresponding to these methods stop at α = 0.75. The plots in the left column of Figure 1 show clearly that the performance of RDA(α, λ, γ) (and its special cases QDA(α) and LDA(α)) depends on α, and that these methods tend to do best at values of α other than 0 or 1. k-NN(α) does best for this example: α = 0.25 with 2 nearest neighbours gives the best performance of all the classifiers.
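The role of the regularisation parameters can be illustrated with a small sketch. It assumes the parameterisation suggested by Table 7, where RDA(0.15, 1, 0) and QDA(0.15) give the same estimate: λ = 1 keeps the class covariances (QDA), λ = 0 is assumed to give the pooled covariance (LDA), and γ shrinks the result towards a multiple of the identity. The identity target (average eigenvalue times I) and the helper name `rda_covariances` are assumptions for illustration, not the paper's implementation. The synthetic data are raw proportions, whose covariance is exactly singular because every row sums to one, an extreme case of the ill-conditioning described above.

```python
import numpy as np

def rda_covariances(X, y, lam, gamma):
    """Regularised class covariance matrices (hypothetical helper, sketch only).

    Assumed parameterisation: lam = 1 keeps the class covariances (QDA),
    lam = 0 uses the pooled covariance (LDA), and gamma in [0, 1] shrinks
    towards (average eigenvalue) * identity.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    covs = {g: np.cov(X[y == g], rowvar=False) for g in classes}
    # Pooled within-class covariance, weighting each class by its degrees of freedom
    pooled = sum((np.sum(y == g) - 1) * covs[g] for g in classes) / (n - len(classes))
    regularised = {}
    for g in classes:
        s = lam * covs[g] + (1.0 - lam) * pooled
        target = (np.trace(s) / p) * np.eye(p)
        regularised[g] = (1.0 - gamma) * s + gamma * target
    return regularised

# Synthetic compositions: each row sums to one, so raw covariances are singular
rng = np.random.default_rng(0)
X = rng.dirichlet([2.0, 3.0, 4.0, 5.0], size=60)
y = np.repeat([0, 1], 30)

plain = rda_covariances(X, y, lam=1.0, gamma=0.0)   # QDA-style, no shrinkage
shrunk = rda_covariances(X, y, lam=1.0, gamma=0.1)  # shrinkage towards identity
for name, covs in (("gamma=0.0", plain), ("gamma=0.1", shrunk)):
    print(name, "condition number:", np.linalg.cond(covs[0]))
```

With γ = 0 the smallest eigenvalue is numerically zero, so the condition number is huge or infinite; any γ > 0 bounds the smallest eigenvalue below by γ times the average eigenvalue, which is one way to see why RDA with γ > 0 stays usable where QDA and LDA break down.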

For the final example, the national income data, the LRA approach of taking α = 0 leads to the best performance of RDA. As in the previous example, the k-NN classifier performs best overall, here with α = −0.5 and 2 neighbours.
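A minimal k-NN(α) sketch follows. It assumes that the α-metric is the Euclidean distance between α-transformed observations, with the transformation taken as (D u_α(x) − 1_D)/α and replaced by the centred log-ratio at α = 0; the Helmert rotation is omitted because it does not change Euclidean distances. The helper names, the leave-one-out evaluation and the tie-breaking rule (smallest label wins) are illustrative choices, not necessarily those behind the results above. For α ≤ 0 all components must be strictly positive, which holds for the synthetic Dirichlet data used here.

```python
import numpy as np

def alpha_transform(X, a):
    """(D * u_a(x) - 1) / a for each row x of X; centred log-ratio when a == 0."""
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    if a == 0:
        L = np.log(X)
        return L - L.mean(axis=1, keepdims=True)
    P = X ** a
    U = P / P.sum(axis=1, keepdims=True)
    return (D * U - 1.0) / a

def knn_loo_accuracy(X, y, a, k):
    """Leave-one-out accuracy of k-NN using Euclidean distance after the transform."""
    y = np.asarray(y)
    Z = alpha_transform(X, a)
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # never use an observation as its own neighbour
    correct = 0
    for i in range(len(y)):
        neighbours = np.argsort(dist[i])[:k]
        labels, counts = np.unique(y[neighbours], return_counts=True)
        prediction = labels[np.argmax(counts)]  # ties go to the smallest label
        correct += int(prediction == y[i])
    return correct / len(y)

rng = np.random.default_rng(1)
X = np.vstack([rng.dirichlet([2, 5, 3], 40), rng.dirichlet([4, 2, 3], 40)])
y = np.repeat([0, 1], 40)

# Choose alpha and k jointly over a grid, allowing negative alpha (no zeros here)
grid = [-0.5, 0.0, 0.25, 0.5, 1.0]
scores = {(a, k): knn_loo_accuracy(X, y, a, k) for a in grid for k in (2, 3)}
best_alpha, best_k = max(scores, key=scores.get)
print("best alpha:", best_alpha, "best k:", best_k, "accuracy:", scores[(best_alpha, best_k)])
```

Picking (α, k) by leave-one-out accuracy on the same data gives an optimistic accuracy for the chosen pair; an honest estimate would need an outer layer of cross-validation.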

5 Conclusions 

We have considered the α-transformation (1) and the α-metric (9) as a means of adapting LDA, QDA, RDA and k-NN to compositional data. This generalises the EDA and LRA approaches via the parameter α, whose choice enables a compromise between the two. Rather than committing to either EDA or LRA, our approach allows α to be chosen for the dataset at hand, and the numerical results suggest a clear benefit from this flexibility.
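The EDA end of this compromise can be made concrete. The sketch assumes the α-transformation has the form z_α(x) = H (D u_α(x) − 1_D)/α, where u_α(x) has components x_i^α / Σ_j x_j^α and H is the (D − 1) × D Helmert sub-matrix; the Helmert choice and the helper names are assumptions about the exact definition. At α = 1, u_1(x) = x, so z_1(x) = D H x, and Euclidean distances between transformed points are exactly D times the Euclidean distances between the raw proportions. Distance-based methods with α = 1 therefore see the same geometry as EDA.

```python
import numpy as np

def helmert_sub(D):
    """(D-1) x D Helmert sub-matrix: orthonormal rows, each orthogonal to the ones vector."""
    H = np.zeros((D - 1, D))
    for i in range(1, D):
        H[i - 1, :i] = 1.0
        H[i - 1, i] = -float(i)
        H[i - 1] /= np.sqrt(i * (i + 1))
    return H

def z_alpha(x, a):
    """Assumed alpha-transformation H (D u_a(x) - 1_D) / a for one composition x, a != 0."""
    x = np.asarray(x, dtype=float)
    D = x.size
    u = x ** a / np.sum(x ** a)
    return helmert_sub(D) @ (D * u - 1.0) / a

x = np.array([0.5, 0.3, 0.2])
w = np.array([0.2, 0.2, 0.6])
ratio = np.linalg.norm(z_alpha(x, 1.0) - z_alpha(w, 1.0)) / np.linalg.norm(x - w)
print(ratio)  # equals D = 3 up to rounding error
```

The ratio is exactly D because x − w sums to zero and H acts as an isometry on such vectors, so the constant factor does not affect nearest-neighbour rankings.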

An important benefit is that such an approach is well defined even when the dataset contains observations with components equal to zero (provided α > 0), unlike LRA, for which ad hoc modifications to the data are needed before the log-ratio transformation can be applied. Within k-NN it is also simple to incorporate any distance that seems appropriate, such as the ESOV metric.
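The behaviour with zeros can be illustrated with a short sketch. It assumes that ESOV denotes the metric of Endres and Schindelin and of Österreicher and Vajda, i.e. the square root of twice the Jensen-Shannon divergence (natural logarithms), with the convention 0 log 0 = 0; the function names are hypothetical. A composition with a zero part gets a finite ESOV distance and a finite α-transformation for α = 0.5, whereas its centred log-ratio is undefined.

```python
import numpy as np

def esov(x, w):
    """sqrt(KL(x||m) + KL(w||m)) with m = (x + w) / 2, i.e. sqrt(2 * Jensen-Shannon divergence)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    m = (x + w) / 2.0

    def kl_to_mid(p):
        # Components with p_i = 0 contribute 0 (convention 0 * log 0 = 0)
        nz = p > 0
        return np.sum(p[nz] * np.log(p[nz] / m[nz]))

    return np.sqrt(kl_to_mid(x) + kl_to_mid(w))

def alpha_core(x, a):
    """(D u_a(x) - 1_D) / a for one composition; finite with zero parts when a > 0."""
    x = np.asarray(x, dtype=float)
    p = x ** a
    return (x.size * p / p.sum() - 1.0) / a

x = np.array([0.6, 0.4, 0.0])  # composition with a zero part
w = np.array([0.3, 0.3, 0.4])

print("ESOV distance:", esov(x, w))
print("alpha = 0.5 transform:", alpha_core(x, 0.5))
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(x)             # log(0) = -inf
    clr_x = logs - logs.mean()   # -inf and nan entries: clr is undefined
print("clr finite?", np.isfinite(clr_x))
```

Because the ESOV distance only needs pairwise evaluations, it can be dropped into a k-NN classifier in place of the Euclidean distance without any change to the voting step.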

Appendix 

Relationship between the a-transformation and centred log-ratio transformation 

The proof that the transformation (D u_α(x) − 1_D)/α, defined on the right-hand side of (1), tends to the centred log-ratio transformation (7) as α → 0 is as follows: for component i,


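$$
\frac{1}{\alpha}\left(\frac{D\,x_i^{\alpha}}{\sum_{j=1}^{D} x_j^{\alpha}} - 1\right)
= \frac{1}{\alpha}\left(\frac{1 + \alpha \log x_i + O(\alpha^2)}{1 + \frac{\alpha}{D}\sum_{j=1}^{D}\log x_j + O(\alpha^2)} - 1\right)
= \log x_i - \frac{1}{D}\sum_{j=1}^{D}\log x_j + O(\alpha)
\;\to\; \log\frac{x_i}{\left(\prod_{j=1}^{D} x_j\right)^{1/D}}
\quad \text{as } \alpha \to 0.
$$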



That the α-metric (9) tends to the LRA metric (10) as α → 0 follows directly from this result.
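The limit derived in this appendix can be checked numerically. The sketch evaluates (D u_α(x) − 1_D)/α for decreasing α and compares it with the centred log-ratio; the gap should shrink roughly in proportion to α, matching the O(α) remainder. It also compares distances, assuming that the α-metric and the LRA metric are Euclidean distances between transformed vectors (the Helmert rotation is omitted because it preserves these distances). The compositions x and w are arbitrary illustrative values.

```python
import numpy as np

def alpha_core(x, a):
    """(D u_a(x) - 1_D) / a for a single composition x (a != 0)."""
    x = np.asarray(x, dtype=float)
    p = x ** a
    return (x.size * p / p.sum() - 1.0) / a

def clr(x):
    """Centred log-ratio: log(x_i / geometric mean of x)."""
    logs = np.log(np.asarray(x, dtype=float))
    return logs - logs.mean()

x = np.array([0.1, 0.3, 0.6])
w = np.array([0.25, 0.25, 0.5])
alphas = [0.5, 0.1, 0.01, 0.001]
errors = []
for a in alphas:
    err = np.max(np.abs(alpha_core(x, a) - clr(x)))
    errors.append(err)
    print(f"alpha = {a:<6} max |difference| = {err:.2e}")

# Distances: alpha-metric (assumed Euclidean on transformed data) versus LRA metric
d_lra = np.linalg.norm(clr(x) - clr(w))
d_small = np.linalg.norm(alpha_core(x, 1e-4) - alpha_core(w, 1e-4))
print("LRA distance:", d_lra, "alpha = 1e-4 distance:", d_small)
```

Each tenfold reduction in α should cut the maximum difference by roughly a factor of ten, which is the O(α) behaviour in the expansion above.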